Performance Analysis of Application Kernels in Multi/Many-Core Architectures
Abstract
In recent years, advances in technology and computing have led to huge amounts of data being generated. High-Performance Computing (HPC) therefore plays an ever-growing role in processing these large datasets in a timely fashion. Our analysis covers a few important throughput-computing application kernels that have a high degree of parallelism, which makes them excellent candidates for evaluation on high-end multi-core CPUs and many-core GPUs. In this work, we compare the performance of important application kernels, namely Image Convolution, Histogram, and Bilateral Filtering, on multi-core CPUs and many-core NVIDIA GPUs, and additionally on our research framework, GPU-enabled Many-Task Computing (GeMTC). GeMTC is an execution model and runtime system that enables NVIDIA GPUs to be programmed with many concurrent and independent tasks of potentially short or variable duration. We provide a thorough performance analysis across CPU, CUDA, and the GeMTC framework, through which we better understand the behavior of different applications that belong to the Many-Task Computing paradigm. The results show that GeMTC is promising for Many-Task Computing workloads running on NVIDIA GPUs.
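As a rough illustration of what the three kernels compute, here is a minimal serial sketch in Python. It assumes grayscale images stored as lists of rows, odd-sized convolution kernels, and integer pixel values in `[0, bins)` for the histogram. The function names (`convolve2d`, `histogram`, `bilateral_filter`, `run_many_tasks`) are hypothetical; this is not the paper's CPU, CUDA, or GeMTC code. `run_many_tasks` only mimics the many-task structure (many small, independent tasks) with a thread pool and does not model GeMTC's GPU-side scheduling.

```python
"""Hypothetical serial reference sketches of the kernels named in the abstract."""
import math
from concurrent.futures import ThreadPoolExecutor


def convolve2d(image, kernel):
    """True 2D convolution (kernel flipped) with zero padding; same-size output."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ry, rx = kh // 2, kw // 2  # kernel radius; assumes odd kernel dimensions
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):  # each output pixel is independent of the others
            acc = 0.0
            for j in range(kh):
                for i in range(kw):
                    yy, xx = y + ry - j, x + rx - i
                    if 0 <= yy < h and 0 <= xx < w:  # zero padding outside
                        acc += image[yy][xx] * kernel[j][i]
            out[y][x] = acc
    return out


def histogram(values, bins=256):
    """Count occurrences of each integer value in [0, bins)."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1  # on a GPU, concurrent updates to one bin need atomics
    return counts


def bilateral_filter(image, radius, sigma_s, sigma_r):
    """Edge-preserving smoothing: weights combine spatial and intensity distance."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = image[y][x]
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        v = image[yy][xx]
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        wr = math.exp(-((v - center) ** 2) / (2 * sigma_r ** 2))
                        num += ws * wr * v
                        den += ws * wr
            out[y][x] = num / den  # den > 0: the center pixel has weight 1
    return out


def run_many_tasks(tasks, workers=4):
    """Run independent (function, args) tasks concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fn, *args) for fn, args in tasks]
        return [f.result() for f in futures]
```

For example, `run_many_tasks([(histogram, ([0, 0, 1], 2)), (sum, ([1, 2, 3],))])` returns `[[2, 1], 6]`. Because of CPython's global interpreter lock, the thread pool will not speed up these pure-Python loops; the point is the structure. Every output pixel (or histogram input) is processed independently, which is the high degree of parallelism that makes these kernels good candidates for many-core GPUs.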
Publication date: 2014